The idea of the analysis of the datasets smashy_lcbench and smashy_super is to understand the dependencies between the hyperparameters and the target variable yval, using the implemented plots from the VisHyp package and, most importantly, without the help of any automatic optimization. We want to understand which parameter is important, i.e. has a large impact on the result, which parameter needs to be set more precisely and for which parameter the value is almost irrelevant. Furthermore, we want to understand the dependencies between the parameters themselves. Finally, we want to compare the results of the two datasets.
For each dataset, we want to examine the entire dataset and the best 20% of the yval values to get a more detailed insight into the configurations of the best results. We will partition our data with the bounded range per parameter to obtain a subset of configurations with good yval values. We will also look at this constrained parameter range using PCPs.
We will use Importance Plots, Partial Dependence Plots (PDP), Heatmaps, and Parallel Coordinate Plots (PCP) to analyze the data. Importance Plots identify the most important parameters. For a quick overview, we will use Heatmaps. For a deeper insight into the boundary structure as well as into dependencies between two parameters, we will then use PDPs. Only once the dataset has been reduced in size can we also use PCPs to get a good impression of the parameter configurations. In addition, we will look at the data using summaries to draw further conclusions.
This analysis is structured as follows. First, the dataset is prepared so that it can be used for the analyses. Then the analysis is performed and the results are used to suggest good configuration ranges for each parameter. The analyses of each parameter, together with deeper insights, can be selected in the Table of Contents (TOC) on the left. Prior to this chapter, an overview of the dataset is provided. Finally, the results of the two datasets are compared.
We need to load packages and subdivide the data to compare the whole dataset and the dataset with the 20% of configurations with the best result. In addition, the data must be manipulated to facilitate the use of the data for summaries and filters.
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
All plots from the VisHyp package require an mlr3 task object as input. Therefore, an mlr3 task with the selected target is required. For lcbench, the target is yval, a logloss performance measure. Values near 0 indicate good performance.
lcbenchTask <- TaskRegr$new(id = "task_lcbench", backend = lcbenchSmashy, target = "yval")
lcbenchBest <- lcbenchSmashy[lcbenchSmashy$yval >= quantile(lcbenchSmashy$yval, 0.8),]
bestTask <- TaskRegr$new(id = "bestTask", backend = lcbenchBest, target = "yval")
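The subsetting rule used for lcbenchBest can be tried on a small synthetic data frame. The following is a minimal sketch in base R; the data frame `toy` and its values are made up for illustration and are not part of smashy_lcbench.

```r
# Toy data (hypothetical values, not taken from smashy_lcbench)
set.seed(1)
toy <- data.frame(x = runif(100), yval = -runif(100, 0.47, 0.96))

# Larger yval is better, so the best 20% lie at or above the 80% quantile
cutoff  <- quantile(toy$yval, 0.8)
toyBest <- toy[toy$yval >= cutoff, ]

nrow(toyBest) / nrow(toy)   # share of rows kept
min(toyBest$yval) >= cutoff # every kept row reaches the cutoff
```

Because larger yval values are better here, the comparison uses `>=` against the upper quantile rather than `<=` against a lower one.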
The target parameter yval can reach values between -0.9647 and -0.4690. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.4690.
The most important parameter is sample. It should always be set to “bohb” rather than “random”, because 2130 of the best 2143 configurations were created with this level and the average effect on yval is much larger when “bohb” is chosen.
The next most important parameter is survival_fraction. A low value is better on average, but high values can also be good for the best configurations. A value between 0.15 and 0.5 should be chosen for a high average performance without any further limitation. If a surrogate_learner is selected, the range of the parameter should be chosen according to that surrogate_learner.
Even though the surrogate_learner parameter is not that important on its own, it influences most other parameters. Other parameter values should therefore be set depending on the selected surrogate_learner whenever they have different effects on the performance measure. An indication of this influence was given by the Importance Plots for the partial datasets split by surrogate_learner, which assigned different importance to the individual parameters depending on the subset selected. This is especially noticeable for “bohb” samples. Parameters that should be selected depending on the chosen surrogate_learner are listed below. There are also findings on which surrogate_learner gives the best results: in the full dataset, knn1 and knn7 showed the best performance and ranger the worst. For the top cases, a disproportionate number of bohblrn and ranger configurations were filtered out. Surprisingly, bohblrn turned out to be the level of greatest importance.
knn1: survival_fraction should get a value above 0.5 if we are interested in the best cases; for the whole dataset, the best cases were on average below 0.5. random_interleave_fraction should be low, with a value between 0.05 and 0.5 according to the complete dataset. budget_log_step should be chosen between -0.5 and 0.5. filter_factor_first should get a value under 4. filter_select_per_tournament should get a value over 0.9.
knn7: filter_factor_first should be under 4. survival_fraction should be between 0.1 and 1 according to both the full dataset and the subset. budget_log_step produces good performance for values between -0.5 and 1 but does not have a big impact in general. random_interleave_fraction should be between 0.25 and 0.75 according to the full dataset; in the subset it does not matter. random_interleave_random should be “FALSE”. filter_select_per_tournament should be over 0.5.
bohblrn: random_interleave_fraction is better when lower; a good value lies between 0.05 and 0.65. For survival_fraction, lower is better in the full dataset, but it does not matter for the best configurations. budget_log_step is hard to judge because of fluctuation but should be at least over -1.5. filter_algorithm should be “progressive”. filter_factor_last should be over 5. filter_factor_first should not be restricted.
ranger: random_interleave_fraction should be over 0.25. survival_fraction should be under 0.75. budget_log_step should be over -1.5.
Another important parameter for the general case is random_interleave_fraction. We have found that in general low values under 0.3 are better for “random” samples, and values between 0.1 and 0.75 are better for “bohb” samples. But this is only the case because the parameter depends on surrogate_learner, which has many observations for the levels knn1 and knn7. For these levels, a low value must be chosen to get a good result. For the “bohb” sample, values in the middle are better, and for “ranger” high values achieve the best yval values. For the top cases, the parameter lost importance. This could be because the opposite case, the “random” samples, is almost completely filtered out. The factor level did not change the behavior for the top cases (for bohblrn, the middle range is no longer so important).
The second most important parameter for “bohb” sampling is the budget_log_step parameter. For the full dataset this parameter should be set between -0.5 and 1, but once a surrogate_learner is chosen, the parameter should be set according to that learner.
filter_with_max_budget is not important in general but should always be set to “TRUE”; it is more important for “bohb” samples. However, the effect is important for the surrogate_learner “bohblrn” in the top cases.
filter_factor_first is the most important parameter for the top 20%. It also has a higher importance in “random” samples than in “bohb” samples. In general it should be low (under 4) for “bohb” samples and high (close to 6) for “random” samples. The parameter filter_factor_first should not be restricted if the surrogate_learner is “bohblrn.”
filter_factor_last: The effect is low, and the parameter shouldn’t be used to subdivide the dataset in general.
filter_select_per_tournament shouldn’t be too high in the general case but doesn’t really matter for good results.
filter_algorithm and random_interleave_random have hardly any effect and can be left out of deeper investigations. They should only be considered for the surrogate_learner level “bohblrn”.
To verify the proposed parameter configurations, we constrain the dataset and compare the obtained performance with the ranks of the performance of the whole dataset.
lcbenchEvaluation <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$surrogate_learner == "bohblrn",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$random_interleave_fraction > 0.05 & lcbenchEvaluation$random_interleave_fraction < 0.65,]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$budget_log_step > -1.5,]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_with_max_budget == "TRUE",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_algorithm == "progressive",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_factor_last > 5,]
lcbenchYval <- sort(lcbenchEvaluation$yval, decreasing = TRUE)
lcbenchYvalOriginal <- sort(lcbenchSmashy$yval, decreasing = TRUE)
sort(match(lcbenchYval, lcbenchYvalOriginal), decreasing = FALSE)
## [1] 5 6 11 12 13 16 17 21 23 25 26 28 29 30 35
## [16] 37 44 47 53 55 57 64 75 82 84 112 117 129 139 178
## [31] 182 214 247 1153 2896 3181 3944 3961 5161 5997 6095 6635 6953 7318 7450
## [46] 7707 7930 8208 8212
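The rank computation above combines sort() and match(). A minimal sketch on made-up numbers (the vectors `full` and `subsetVals` are hypothetical) shows what the printed positions mean:

```r
# Hypothetical yval values for a full dataset and a filtered subset
full       <- c(-0.50, -0.47, -0.90, -0.60, -0.48, -0.55)
subsetVals <- c(-0.48, -0.60)

# Best (largest) value first, so position 1 is the best configuration
fullSorted <- sort(full, decreasing = TRUE)

# match() returns the position of each subset value in the sorted full vector
ranks <- sort(match(subsetVals, fullSorted))
ranks
```

Note that match() reports only the first matching position, so configurations with identical yval values would share the same rank.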
We can see that many good results were obtained, but by no means all of the best configurations were found. This can be explained by the fact that we often imposed constraints to reduce the size of the dataset. For example, for some categorical parameters, we always chose one level even though we knew that other categories could also yield good values. Furthermore, numerical parameters were partly restricted, although it was known that very good yval values can also be obtained outside the chosen range.
Most interestingly, we get many good results but also some seemingly bad ones. This could be due to hidden interactions that were not found, or to inaccuracies in the constraints derived from the visualization plots. In the latter case, the poorer performance values could be due to errors in the interpretation of the plots. Difficulties with the surrogate model could also play a role if the predicted performance values are not determined correctly. In addition, an inappropriate grid size in a PCP can lead to inaccuracies.
Finally, some metrics are used to verify the results. The importance of the metrics is discussed in the bachelor thesis.
summary(lcbenchEvaluation$yval)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6003 -0.5262 -0.4798 -0.5026 -0.4748 -0.4713
#proportion
length(lcbenchEvaluation$yval)/length(lcbenchSmashy$yval)
## [1] 0.004574309
#top configuration
sum(lcbenchYval >= quantile(lcbenchSmashy$yval, 0.95))/length(lcbenchYval)
## [1] 0.6734694
#quantile
sum(lcbenchSmashy$yval<=max(lcbenchYval))/length(lcbenchSmashy$yval)
## [1] 0.9996266
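The three metrics above (proportion, share of top configurations, and quantile of the best value found) can be wrapped in one helper. This is only a sketch: the function name `evaluateSubset` and the toy vectors are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical helper bundling the three verification metrics
evaluateSubset <- function(subsetY, fullY, topQuantile = 0.95) {
  c(
    # share of the full dataset that survives the constraints
    proportion   = length(subsetY) / length(fullY),
    # share of the subset that lies in the top 5% of the full dataset
    topShare     = mean(subsetY >= quantile(fullY, topQuantile)),
    # quantile of the full dataset reached by the best subset value
    bestQuantile = mean(fullY <= max(subsetY))
  )
}

fullY   <- -(1:100) / 100              # toy values -0.01, ..., -1.00
subsetY <- c(-0.02, -0.03, -0.50, -0.70)
res <- evaluateSubset(subsetY, fullY)
res
```

A small proportion combined with a high topShare would indicate that the constraints select few configurations, most of which are very good.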
With the implemented PCP, our results can be checked visually.
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Best_PCP.png")
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Bad_PCP.png")
For visual analysis it is important to know the configuration spaces and the class of parameters.
head(lcbenchSmashy)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## 1 0.11449875 0.26100298 knn7 FALSE
## 2 -0.42921649 0.33760502 knn7 TRUE
## 3 0.04823162 0.01486055 knn7 TRUE
## 4 0.85378442 0.73223279 bohblrn TRUE
## 5 -1.45588046 0.85519272 knn7 TRUE
## 6 -0.45467437 0.11901165 knn1 FALSE
## filter_factor_first random_interleave_fraction random_interleave_random
## 1 0.2337803 0.2254148 TRUE
## 2 3.7563675 0.1042924 TRUE
## 3 1.0023879 0.5424223 FALSE
## 4 0.4368656 0.4891884 FALSE
## 5 0.6717368 0.5157025 FALSE
## 6 0.7571962 0.7391276 FALSE
## sample filter_factor_last filter_algorithm filter_select_per_tournament
## 1 bohb 0.3870927 progressive 2.2749194
## 2 random 1.5890745 progressive 2.2996638
## 3 random 2.9274948 progressive 1.9313954
## 4 bohb 5.7753986 progressive 1.5170413
## 5 bohb 6.4220781 tournament 0.5100007
## 6 random 2.9316765 progressive 0.4911047
## yval
## 1 -0.4989768
## 2 -0.5345810
## 3 -0.5401640
## 4 -0.4748074
## 5 -0.5058519
## 6 -0.5397687
str(lcbenchSmashy)
## 'data.frame': 10712 obs. of 12 variables:
## $ budget_log_step : num 0.1145 -0.4292 0.0482 0.8538 -1.4559 ...
## $ survival_fraction : num 0.261 0.3376 0.0149 0.7322 0.8552 ...
## $ surrogate_learner : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 1 3 2 1 2 3 3 ...
## $ filter_with_max_budget : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 2 1 1 1 2 1 ...
## $ filter_factor_first : num 0.234 3.756 1.002 0.437 0.672 ...
## $ random_interleave_fraction : num 0.225 0.104 0.542 0.489 0.516 ...
## $ random_interleave_random : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 1 1 1 1 1 2 1 ...
## $ sample : Factor w/ 2 levels "bohb","random": 1 2 2 1 1 2 1 2 2 2 ...
## $ filter_factor_last : num 0.387 1.589 2.927 5.775 6.422 ...
## $ filter_algorithm : Factor w/ 2 levels "progressive",..: 1 1 1 1 2 1 1 1 2 2 ...
## $ filter_select_per_tournament: num 2.27 2.3 1.93 1.52 0.51 ...
## $ yval : num -0.499 -0.535 -0.54 -0.475 -0.506 ...
We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).
plotImportance(lcbenchTask)
plotImportance(bestTask)
For the general case, sample is the most important hyperparameter. The random_interleave_random parameter is of little importance. For the best configurations, filter_factor_first and filter_factor_last are the most important parameters and the sample parameter is no longer of importance. The ranking of the parameters has changed a lot, but the value of the importance measure has hardly changed for the parameters except for the sample parameter. We also look at a PCP:
plotParallelCoordinate(lcbenchTask)
It can be seen that there are too many observations to see much. The PCP makes more sense with fewer observations. After dividing the data, we first look for structural changes.
summary(lcbenchSmashy)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.7528 Min. :0.0000686 bohblrn:1372 FALSE:4801
## 1st Qu.:-1.0795 1st Qu.:0.1877029 knn1 :3111 TRUE :5911
## Median :-0.4192 Median :0.3602689 knn7 :4803
## Mean :-0.3839 Mean :0.4179906 ranger :1426
## 3rd Qu.: 0.3110 3rd Qu.:0.6339252
## Max. : 1.0196 Max. :0.9998031
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.000763 Min. :0.0000227 FALSE:5008
## 1st Qu.:2.791122 1st Qu.:0.1496729 TRUE :5704
## Median :4.452371 Median :0.3419693
## Mean :4.139002 Mean :0.3893602
## 3rd Qu.:5.690380 3rd Qu.:0.6082803
## Max. :6.907525 Max. :0.9999744
## sample filter_factor_last filter_algorithm
## bohb :8763 Min. :0.000763 progressive:3882
## random:1949 1st Qu.:2.462215 tournament :6830
## Median :4.267029
## Mean :3.960315
## 3rd Qu.:5.569787
## Max. :6.907578
## filter_select_per_tournament yval
## Min. :0.001612 Min. :-0.9647
## 1st Qu.:1.000000 1st Qu.:-0.5923
## Median :1.000000 Median :-0.5377
## Mean :1.086512 Mean :-0.5646
## 3rd Qu.:1.228722 3rd Qu.:-0.5189
## Max. :2.397413 Max. :-0.4690
summary(lcbenchBest)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.7503 Min. :0.000095 bohblrn: 130 FALSE: 731
## 1st Qu.:-1.0406 1st Qu.:0.170492 knn1 : 796 TRUE :1412
## Median :-0.3780 Median :0.332510 knn7 :1161
## Mean :-0.3321 Mean :0.381662 ranger : 56
## 3rd Qu.: 0.3890 3rd Qu.:0.523938
## Max. : 1.0195 Max. :0.999789
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.004248 Min. :0.0000964 FALSE:1020
## 1st Qu.:3.643269 1st Qu.:0.1208691 TRUE :1123
## Median :4.845318 Median :0.2392768
## Mean :4.546724 Mean :0.3170039
## 3rd Qu.:5.870564 3rd Qu.:0.4727989
## Max. :6.907525 Max. :0.9979292
## sample filter_factor_last filter_algorithm
## bohb :2130 Min. :0.004248 progressive: 798
## random: 13 1st Qu.:3.101750 tournament :1345
## Median :4.634717
## Mean :4.263191
## 3rd Qu.:5.721979
## Max. :6.907525
## filter_select_per_tournament yval
## Min. :0.002426 Min. :-0.5160
## 1st Qu.:1.000000 1st Qu.:-0.5126
## Median :1.000000 Median :-0.5082
## Mean :1.064477 Mean :-0.5047
## 3rd Qu.:1.101817 3rd Qu.:-0.4995
## Max. :2.396205 Max. :-0.4690
surrogate_learner: A disproportionate number of “bohblrn” and “ranger” configurations were filtered out. This could mean that these learners perform worse on average. filter_with_max_budget: Proportionally more “FALSE” configurations were filtered out. This could mean that “TRUE” performs better on average. sample: Only 13 rows of the best 20% of configurations use “random” sampling; the other 2130 use “bohb” sampling. This is also the reason why the parameter sample has no importance for the subdivided data frame: there are barely any configurations with the level “random” left.
The hyperparameters will be examined more closely in the following sections.
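The "disproportionate" filtering can be quantified directly from the two summaries. The sketch below copies the surrogate_learner counts from summary(lcbenchSmashy) and summary(lcbenchBest) by hand; the vector names are chosen here for illustration only.

```r
# Counts copied from the two summaries above
full <- c(bohblrn = 1372, knn1 = 3111, knn7 = 4803, ranger = 1426)
best <- c(bohblrn = 130,  knn1 = 796,  knn7 = 1161, ranger = 56)

# Share of each level that survives into the top 20% subset
retention <- round(best / full, 3)
retention

# Overall share for comparison (about 0.2 by construction)
sum(best) / sum(full)
```

Levels whose retention lies clearly below the overall share of about 0.2 are the ones that were filtered out disproportionately.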
As we could notice, sample is the most important parameter in the full dataset. This parameter should have the right value to perform well. So let’s look at the effect of the variables in a PDP. We also check if the effect applies to all parameters. We can use a Heatmap to get a quick overview of the interactions. Values close to 1 have hardly any effect on the result.
plotPartialDependence(lcbenchTask, features = c("sample"), rug = FALSE, plotICE = FALSE)
subplot(
plotHeatmap(lcbenchTask, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5,shareX = TRUE)
PDP: It can be seen that “bohb” samples always lead to better target values on average than “random” samples.
Heatmaps: Note that survival_fraction and random_interleave_fraction may give better results if a lower value is chosen. Also, the surrogate_learner levels knn1 and knn7 seem to give better results. On average, the “bohb” sample is better, but let’s look at the best results and the combination of their instances.
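To make the idea behind a PDP concrete, the following base R sketch computes a partial dependence curve by hand. It is not how plotPartialDependence is implemented; the toy data, the lm() stand-in model, and the helper `partialDependence` are all assumptions made for illustration.

```r
# Toy data (hypothetical): y peaks around a = 0.3 and rises slightly with b
set.seed(42)
n   <- 200
toy <- data.frame(a = runif(n), b = runif(n))
toy$y <- -(toy$a - 0.3)^2 + 0.1 * toy$b + rnorm(n, sd = 0.01)

# Stand-in surrogate model (assumption: a simple quadratic linear model)
fit <- lm(y ~ a + I(a^2) + b, data = toy)

# Partial dependence: fix the feature at a grid value for every row,
# predict, and average the predictions over all rows
partialDependence <- function(model, data, feature, grid) {
  sapply(grid, function(v) {
    tmp <- data
    tmp[[feature]] <- v
    mean(predict(model, newdata = tmp))
  })
}

grid <- seq(0, 1, length.out = 11)
pd   <- partialDependence(fit, toy, "a", grid)
grid[which.max(pd)]   # grid value with the best average prediction
```

The curve is an average over all other parameter values, which is why a PDP describes the general case and can hide what happens for the best configurations.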
We want to look at only the best configurations and verify that mostly “bohb” samples occur. Therefore we split the dataset into “bohb” and “random” samples.
random <- lcbenchSmashy[lcbenchSmashy$sample == "random",]
bohb <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",]
randomTask <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbTask <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")
We split the entire dataset because we assume differences between “random” and “bohb” samples: among the best configurations, many “random” samples were filtered out and the parameter lost a lot of importance. For these reasons, we focus primarily on the “bohb” sample in what follows. For the best 20% of configurations we focus on “bohb” only.
Let’s check if there are differences in importance for the parameters in the “random” subset and the “bohb” subset.
plotImportance(bohbTask)
plotImportance(randomTask)
The hyperparameter survival_fraction is the most important parameter. Also random_interleave_fraction has high importance for both subsets. The parameters filter_algorithm and random_interleave_random do not seem to be important at all.
Bohb sample: The parameter budget_log_step is now more important. In the first plot, this parameter was not ranked that high. So we can assume that it is very important for this subset. The importance of the other parameters has not changed that much compared to the full data but the hyperparameter surrogate_learner and filter_with_max_budget are more important than for “random” samples.
Random sample: It looks like the right parameter configuration is more important in the “random” sample, because the parameter importance values are in general higher than in the “bohb” sample. The parameters filter_factor_last and filter_factor_first have a higher importance in the “random” sample.
We saw at the beginning that most of the good results were obtained with “bohb” samples. That’s why we will focus on “bohb” samples only from now on. That is, we remove the 13 rows of “random” samples from the underlying data.
bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8),]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")
The survival_fraction parameter is the most important parameter for both samples of the entire dataset. With a PDP, we can gain better insight into how the parameter should be configured.
plotPartialDependence(bohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(randomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In general, lower values achieve better performance than higher values. For the “bohb” subset, the best range seems to be between 0.15 and 0.6. This means that too low a value is not so good in this case. For the “random” subset the curve is almost monotonically decreasing, which means that lower values are always better.
One way to find reasons for this structure is to filter the dataset again. For this we can split the data according to the best 20% of yval values of the “bohb” samples.
plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE, gridsize = 20)
In this case, higher values seem to be somewhat better. This is surprising, since in the general case low values were more important. It could mean that with good configurations of other parameters, the survival_fraction parameter even gives better results when a high value is chosen. This could also explain the increase in the range between 0.5 and 0.75. Looking at the rug, we see that most configurations were made below 0.5 and the fewest configurations were made above 0.75. Because of the few configurations with high values, the effect of good performances in this range is less strong. In the range between 0.5 and 0.75, there are more configurations, which therefore have a greater impact on the average curve. However, the difference on the y-axis is only small and therefore it cannot be said that high values are better.
Another important parameter for the “bohb” subset is surrogate_learner.
plotPartialDependence(bohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
In this plot, knn1 and knn7 seem to be the best choices based on the results so far. For a more detailed analysis, we should divide the data by the individual surrogate_learner levels again and check whether there are differences in the importance of the remaining parameters.
knn1 <- bohb[bohb$surrogate_learner == "knn1",]
knn7 <- bohb[bohb$surrogate_learner == "knn7",]
bohblrn <- bohb[bohb$surrogate_learner == "bohblrn",]
ranger <- bohb[bohb$surrogate_learner == "ranger",]
knn1Task <- TaskRegr$new(id = "knn1Task", backend = knn1, target = "yval")
knn7Task <- TaskRegr$new(id = "knn7Task", backend = knn7, target = "yval")
bohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = bohblrn, target = "yval")
rangerTask <- TaskRegr$new(id = "rangerTask", backend = ranger, target = "yval")
plotImportance(knn1Task)
plotImportance(knn7Task)
plotImportance(bohblrnTask)
plotImportance(rangerTask)
The parameter survival_fraction is very important for the “bohblrn” and “knn1” subset. This could already be seen in the PDP for survival_fraction. The hyperparameter random_interleave_fraction has high importance for all surrogate_learners. For the factor “knn7” the parameter budget_log_step seems to be more important than for other factors of the surrogate_learner parameter. To check why the importance differs and whether the parameters have different good ranges, let’s take a closer look at 3 very important parameters. We use ICE curves here to gain further insight. Later we check each factor separately for the top 20% of the configuration to find differences.
plotPartialDependence(knn1Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(knn7Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(bohblrnTask, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(rangerTask, "random_interleave_fraction", plotICE = FALSE)
For “knn1”, lower random_interleave_fraction values seem to be better. For “knn7” and “bohblrn”, the random_interleave_fraction values should be neither too high nor too low, and for “ranger”, higher values lead to better yval results. A good range for “bohblrn” seems to be between 0.05 and 0.65. For “knn1”, a value between 0.05 and 0.5 seems good. A good range for “knn7” seems to be between 0.25 and 0.75.
plotPartialDependence(knn1Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn7Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(bohblrnTask, "survival_fraction", plotICE = FALSE)
plotPartialDependence(rangerTask, "survival_fraction", plotICE = FALSE)
Low values of survival_fraction are better in general and could be set to under 0.5, but high values are worst for “bohblrn”. For the surrogate_learner “knn7”, values around 0.5 seem to produce the best performance; for the level “knn1”, a good choice is between 0.1 and 0.6. For all other levels, values under 0.5 are better.
plotPartialDependence(knn1Task, "budget_log_step", gridsize = 40, plotICE = FALSE)
plotPartialDependence(knn7Task, "budget_log_step", gridsize = 40, plotICE = FALSE)
plotPartialDependence(bohblrnTask, "budget_log_step", plotICE = FALSE)
plotPartialDependence(rangerTask, "budget_log_step", plotICE = FALSE)
It is very interesting that the curve for the parameter budget_log_step shows repeated dips. This occurs only for the levels “knn7” and “knn1”. A good range is hard to identify since it also depends on the gridsize of the plot. It can be said that a value over -0.5 is a good choice for “knn7” and “ranger”. For “bohb” there are repeated dips, but the value should be over -0.5. For “knn1” and “knn7”, values between -0.5 and 1 seem to achieve good results.
We also want to investigate the best cases and for this directly check the subdivided datasets. To do so, we will search for and analyze the most important parameters with the Importance Plot. In addition, we will examine anomalies in the PCP in more detail and also look at some summaries.
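Why the visible dips depend on the grid can be illustrated with a purely synthetic curve. The function `curveFun` below is a made-up response with one narrow dip; it is not derived from the lcbench data, and the x range only roughly mimics that of budget_log_step.

```r
# Hypothetical response curve with a narrow dip near x = 0.13
curveFun <- function(x) -0.5 - 0.05 * exp(-((x - 0.13) / 0.02)^2)

# Evaluate the same curve on a coarse and on a finer grid
coarse <- seq(-1.75, 1, length.out = 10)
fine   <- seq(-1.75, 1, length.out = 40)

min(curveFun(coarse))  # lowest value seen on the coarse grid
min(curveFun(fine))    # lowest value seen on the finer grid
```

On the coarse grid no evaluation point falls close to the dip, so it is barely visible, whereas the finer grid picks up part of it. Narrow structures in a PDP should therefore be checked with more than one gridsize before a range is derived from them.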
plotPartialDependence(bestTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
The level “bohblrn” of the surrogate_learner parameter is now the most important, and the level “ranger” is clearly more important now.
Let’s investigate the surprising outcome of the surrogate_learner level “bohblrn”.
bohblrnBest <- bohbBest[bohbBest$surrogate_learner == "bohblrn",]
bohblrnTaskBest <- TaskRegr$new(id = "bohblrnTask", backend = bohblrnBest, target = "yval")
plotParallelCoordinate(bohblrnTaskBest, labelangle = 10)
plotImportance(bohblrnTaskBest)
PCP: A high value for the filter_factor_last parameter could be better, since there are a lot of lines there and they reach high yval values. The filter_with_max_budget parameter should be set to “TRUE” and the parameter filter_algorithm should be set to “progressive”. It looks like high budget_log_step values achieve the best results. The parameter filter_factor_first should not be restricted.
Importance Plot: In the general case for bohblrn, survival_fraction was by far the most important parameter; now it is budget_log_step and filter_with_max_budget.
Let’s investigate why the survival_fraction parameter lost in importance.
plotPartialDependence(bohblrnTaskBest, "survival_fraction")
plotPartialDependence(bohblrnTask, "survival_fraction")
Previously, a high survival_fraction led to a drop, but one can see that it doesn’t affect the very good results! Here we can see why ICE curves can be a useful addition to the PDP.
Let us look at the other important parameters from the PCP and Importance Plot for the level “bohblrn” of surrogate_learner.
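How ICE curves can reveal what an averaged PDP hides is sketched below in base R. The data are constructed so that the effect of a feature flips sign between two groups; the data frame `toy`, the lm() stand-in model, and all names are assumptions for illustration, not the VisHyp implementation.

```r
# Hypothetical data: the effect of `a` on y flips sign with group g
toy <- expand.grid(a = seq(0, 1, by = 0.1), g = c(0, 1))
toy$y <- toy$a * (2 * toy$g - 1)

fit <- lm(y ~ a * g, data = toy)   # stand-in surrogate model (assumption)

grid <- seq(0, 1, by = 0.25)
# ICE: one prediction curve per observation, with `a` set to each grid value
ice <- sapply(grid, function(v) {
  tmp <- toy
  tmp$a <- v
  predict(fit, newdata = tmp)
})
pdp <- colMeans(ice)               # PDP: pointwise average of the ICE curves

round(pdp, 6)                      # averaged curve
round(ice[1, ], 6)                 # an observation with g = 0
round(ice[nrow(toy), ], 6)         # an observation with g = 1
```

In this constructed case the averaged curve is flat although every single observation reacts to the feature, which is the kind of pattern that only the individual curves can show.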
plotPartialDependence(bohblrnTaskBest, "budget_log_step", gridsize = 30, plotICE = FALSE)
plotPartialDependence(bohblrnTaskBest, "filter_with_max_budget")
plotPartialDependence(bohblrnTaskBest, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(bohblrnTaskBest, "filter_algorithm")
summary(bohblrnBest$filter_algorithm)
## progressive tournament
## 63 54
summary(bohblrn$filter_algorithm)
## progressive tournament
## 278 590
In general, budget_log_step performs better with higher values, but the worse predictions barely increase with higher configuration values. There are also small drops around -0.3 to 0.5.
filter_with_max_budget should be set to “TRUE”. There are more observations left with “TRUE” than with “FALSE”. Proportionally, more “FALSE” configurations have already been filtered out, which is another indication that “TRUE” is the better choice for yval.
For the parameter filter_factor_last, high values could give the best results: even though the differences are small, there are more observations there than in other ranges. A good choice for a configuration is over 5.
The hypothesis that filter_algorithm should be “progressive” can be confirmed. The Partial Dependence Plot doesn’t show it, but a lot of “tournament” configurations were filtered out.
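The filtering of “tournament” configurations can be read off the two summaries above. The following sketch simply copies the printed counts for bohblrn and bohblrnBest; the vector names are chosen here for illustration.

```r
# Counts copied from summary(bohblrn$filter_algorithm) and
# summary(bohblrnBest$filter_algorithm) above
bohblrnAll <- c(progressive = 278, tournament = 590)
bohblrnTop <- c(progressive = 63,  tournament = 54)

# Share of each filter_algorithm level that remains in the top subset
kept <- round(bohblrnTop / bohblrnAll, 3)
kept
```

A clearly higher share for “progressive” than for “tournament” supports the statement above, even though the PDP itself does not show the difference.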
Let’s now investigate the surrogate_learner level knn1.
knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1",]
knn1BestTask <- TaskRegr$new(id = "knn1BestTask", backend = knn1Best, target = "yval")
plotParallelCoordinate(knn1BestTask, labelangle = 10)
plotImportance(knn1BestTask)
PCP: The parameter filter_with_max_budget should be set to “TRUE”. It looks like there are specific ranges of budget_log_step which bring better results. The hyperparameter survival_fraction should be high and the parameter random_interleave_fraction should be low for good results. High filter_factor_last values could be better, since there are a lot of lines there and they result in high yval values. The parameter filter_select_per_tournament should be set to 1.
Importance Plot: The parameters filter_factor_first, survival_fraction, and filter_factor_last are the most important according to the Importance Plot.
The interesting parameter according to PCP and Importance Plots should be examined.
plotPartialDependence(knn1BestTask, "filter_factor_first", plotICE = FALSE )
plotPartialDependence(knn1BestTask, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_with_max_budget")
plotPartialDependence(knn1BestTask, "budget_log_step", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_select_per_tournament", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "random_interleave_fraction", plotICE = FALSE)
In general, the parameter filter_factor_first seems to produce better results in low ranges, with the best results in configuration ranges under 4. The variable survival_fraction should get a value over 0.5 (interesting, because in the general case lower values were better!). The hyperparameters filter_factor_last and random_interleave_fraction don’t really tell us where the best configurations are.
knn7Best <- bohbBest[bohbBest$surrogate_learner == "knn7",]
knn7BestTaskBest <- TaskRegr$new(id = "knn7Task", backend = knn7Best, target = "yval")
plotParallelCoordinate(knn7BestTaskBest, labelangle = 10)
plotImportance(knn7BestTaskBest)
PCP: filter_algorithm should be “tournament”. filter_factor_first should be around 4. random_interleave_random should be “FALSE”. survival_fraction seems to work best at low values. The parameter filter_with_max_budget should be set to “TRUE”. The hyperparameter random_interleave_fraction should get a low value and the parameter filter_select_per_tournament should get a value around 1.
Importance Plot: The most important parameters are filter_factor_first, filter_factor_last and budget_log_step.
plotPartialDependence(knn7BestTaskBest, "filter_factor_first", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "budget_log_step", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_algorithm", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "random_interleave_random")
plotPartialDependence(knn7BestTaskBest, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_select_per_tournament", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_with_max_budget")
The parameter filter_factor_first should be under 4. budget_log_step produces the best values over 0.5 but does not have a big impact in general. Again, we don’t see a clear best range for filter_factor_last and random_interleave_fraction. We also cannot confirm with certainty that “tournament” is always better. random_interleave_random should be “FALSE”. filter_select_per_tournament should be over 0.5. filter_with_max_budget should be “TRUE”.
Finally, the “ranger” surrogate_learner should be investigated, since its average performance for good configurations increased a lot.
rangerBest <- bohbBest[bohbBest$surrogate_learner == "ranger",]
rangerBestTaskBest <- TaskRegr$new(id = "rangerBestTask", backend = rangerBest, target = "yval")
plotParallelCoordinate(rangerBestTaskBest, labelangle = 10)
plotImportance(rangerBestTaskBest)
PCP: budget_log_step should be high. filter_with_max_budget should be “TRUE”.
Importance Plot: The most important parameters are filter_factor_first, filter_with_max_budget and budget_log_step.
plotPartialDependence(rangerBestTaskBest, "filter_factor_first", plotICE = FALSE)
plotPartialDependence(rangerBestTaskBest, "budget_log_step", plotICE = FALSE)
plotPartialDependence(rangerBestTaskBest, "filter_with_max_budget", plotICE = FALSE)
A high budget_log_step and a low filter_factor_first seem to produce the best performance. For budget_log_step a value over -0.5 seems to be good; for filter_factor_first a value under 2.5 performs best. Note that only around 45 observations are left, so the interpretation is not that clear. The parameter filter_with_max_budget should be set to “TRUE”.
Another important parameter for the “bohb” samples is the budget_log_step parameter. Let’s have a look at the PDP.
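Since the reliability of these readings depends on how many observations remain in each subset, it can help to count them before interpreting the plots. The following is a minimal sketch in base R. The data frame toyBest is a made-up stand-in for bohbBest (its values are invented for illustration only), and the threshold minObs is a hypothetical choice, not a value taken from this analysis.

```r
# Toy stand-in for bohbBest: the column names follow the text,
# the values are invented for illustration only.
toyBest <- data.frame(
  surrogate_learner = c("knn1", "knn1", "knn7", "ranger", "knn7", "knn1"),
  yval = c(0.61, 0.63, 0.58, 0.64, 0.59, 0.62)
)

# Number of configurations left per surrogate_learner.
subsetSizes <- table(toyBest$surrogate_learner)
print(subsetSizes)

# Flag learners whose subset falls below a hypothetical minimum size.
minObs <- 2
smallSubsets <- names(subsetSizes)[as.vector(subsetSizes) < minObs]
print(smallSubsets)
```

Applied to the real subset, the same two calls would show directly which surrogate_learner groups are too small for a confident interpretation.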
plotPartialDependence(bohbTask, "budget_log_step", plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("budget_log_step"), plotICE = FALSE)
In general, the value for budget_log_step should be over -0.5. A high value seems to be a good choice in the subdivided dataset. However, we also saw before that the effect of the parameter varies greatly for the surrogate_learner values “knn1” and “knn7”, so the parameter is assigned a high importance without it being clear how best to set it.
The parameter random_interleave_fraction can vary between 0 and 1. It had a high importance in both the “bohb” sample and the “random” sample, and was slightly more important in the “random” sample. Let’s check this parameter.
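When a PDP gives no clear picture, a rough cross-check is to average the raw yval values within bins of the parameter. Note that this is a plain binned mean and not a partial dependence: it does not account for the other parameters. The sketch below uses simulated data. The data frame toyData, the assumed range of budget_log_step and the simulated relationship to yval are all invented for illustration.

```r
set.seed(1)

# Simulated stand-in data: the range of budget_log_step and the
# relationship to yval are invented, not taken from the real dataset.
toyData <- data.frame(budget_log_step = runif(200, min = -2, max = 2))
toyData$yval <- 0.1 * toyData$budget_log_step + rnorm(200, sd = 0.05)

# Cut the parameter into four equal-width bins and average yval per bin.
breaks <- seq(-2, 2, by = 1)
toyData$bin <- cut(toyData$budget_log_step, breaks = breaks, include.lowest = TRUE)
binnedMeans <- tapply(toyData$yval, toyData$bin, mean)
print(round(binnedMeans, 3))
```

If the binned means and the PDP point in the same direction, the reading of the PDP is more trustworthy. If they disagree, interactions with other parameters are a likely cause.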
plotPartialDependence(bohbTask, features = c("random_interleave_fraction"), plotICE = FALSE)
plotPartialDependence(randomTask, features = c("random_interleave_fraction"), plotICE = FALSE)
For random_interleave_fraction in the “bohb” sample, a good choice is a value that is neither too high nor too low, since the extremes give the worst performance. A good value seems to be between 0.1 and 0.7. For the “random” sample, low values bring better performance.
plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), plotICE = FALSE)
For the top configurations shown in the plot above, there is no bad range at the edges.
The parameter filter_factor_last was less important, but a quick check is still worthwhile.
plotPartialDependence(bohbTask, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), plotICE = FALSE)
The effect is small, so the value should be chosen only according to the surrogate_learner.
plotPartialDependence(bohbTask, features = c("filter_with_max_budget"), rug = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_with_max_budget"), rug = FALSE)
The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.
The parameter filter_select_per_tournament had barely any effect in the general case but became a little more important in the top 20% of configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament"), plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_first"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_last"), rug = FALSE, gridsize = 10)
The effect is weak and may come from the peaks around 1. The parameter value should probably be chosen at 1 or slightly above, but the choice shouldn’t matter much.
The parameter filter_factor_first was ranked very high in the parameter Importance Plot for the top configurations.
plotPartialDependence(bohbBestTask, features = c("filter_factor_first"), gridsize = 20, plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "filter_factor_last"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "budget_log_step"), rug = FALSE, gridsize = 10)
In general, lower values for filter_factor_first achieve slightly better performance. However, the differences are small and should not change the conclusions drawn so far.